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ABSTRACT 

This  paper  investigates  the  effects  of  spatial  filtering  on  the  ensemble-based  estimate  of  the  background 
error  covariance  matrix  in  an  ensemble-based  Kalman  filter  (EnKF).  In  particular,  a  novel  kernel  smoothing 
method  with  variable  bandwidth  is  introduced  and  its  performance  is  compared  to  that  of  the  widely  used 
Gaspari-Cohn  filter,  which  uses  a  fifth-order  kernel  function  with  a  fixed  localization  length.  Numerical 
experiments  are  carried  out  with  the  40-variable  Lorenz-96  model.  The  results  of  the  experiments  show  that 
the  nonparametric  approach  provides  a  more  accurate  estimate  of  the  background  error  covariance  matrix 
than  the  Gaspari-Cohn  filter  with  any  localization  length.  It  is  also  shown  that  the  Gaspari-Cohn  filter  tends 
to  provide  more  accurate  estimates  of  the  covariance  with  shorter  localization  lengths.  However,  the  analyses 
obtained  by  using  longer  localization  lengths  tend  to  be  more  accurate  than  those  produced  by  using  short 
localization  lengths  or  the  nonparametric  approach.  This  seemingly  paradoxical  result  is  explained  by 
showing  that  localization  with  longer  localization  lengths  produces  filtered  estimates  whose  time  mean  is  the 
most  similar  to  the  time  mean  of  both  the  unfiltered  estimate  and  the  true  covariance.  This  result  suggests  that 
a  better  metric  of  covariance  filtering  skill  would  be  one  that  combined  a  measure  of  closeness  to  the  sample 
covariance  matrix  for  a  very  large  ensemble  with  a  measure  of  similarity  between  the  climatological  averages 
of  the  filtered  and  sample  covariance. 


1.  Introduction 

Atmospheric  data  assimilation  is  the  process  of  esti¬ 
mating  the  spatiotemporally  evolving  state  of  the  at¬ 
mosphere  based  on  observations.  The  resulting  state 
estimate  at  a  given  time  is  called  analysis.  Modern  data 
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assimilation  algorithms  obtain  the  analysis  by  a  statistical 
interpolation  process:  the  analysis  is  computed  by  updat¬ 
ing  an  a  priori  estimate  of  the  state,  called  background, 
based  on  the  observed  information  assuming  that  the 
background  and  observation  errors  are  random  variables 
with  known  statistical  parameters  (Daley  1991;  Kalnay 
2003).  In  particular,  the  data  assimilation  schemes,  which 
are  able  to  handle  the  large  number  of  state  variables  and 
observations  in  a  realistic  operational  or  research  appli¬ 
cation,  assume  that  the  probability  distribution  of  the 
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Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 


1.  REPORT  DATE 

MAR  2011 


2.  REPORT  TYPE 


4.  TITLE  AND  SUBTITLE 

A  Statistical  Investigation  of  the  Sensitivity  of  Ensemble-Based  Kalman 
Filters  to  Covariance  Filtering 

6.  AUTHOR(S) 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Texas  A&M  University, Department  of  Statistics, College 
Station, TX, 77843-3143 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 


3.  DATES  COVERED 

00-00-2011  to  00-00-2011 

5a.  CONTRACT  NUMBER 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 


12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

This  paper  investigates  the  effects  of  spatial  filtering  on  the  ensemble-based  estimate  of  the  background 
error  covariance  matrix  in  an  ensemble-based  Kalman  filter  (EnKF).  In  particular,  a  novel  kernel 
smoothing  method  with  variable  bandwidth  is  introduced  and  its  performance  is  compared  to  that  of  the 
widely  used  Gaspari?Cohn  filter,  which  uses  a  fifth-order  kernel  function  with  a  fixed  localization  length. 
Numerical  experiments  are  carried  out  with  the  40-variable  Lorenz-96  model.  The  results  of  the 
experiments  show  that  the  nonparametric  approach  provides  a  more  accurate  estimate  of  the  background 
error  covariance  matrix  than  the  Gaspari?Cohn  filter  with  any  localization  length.  It  is  also  shown  that  the 
Gaspari?Cohn  filter  tends  to  provide  more  accurate  estimates  of  the  covariance  with  shorter  localization 
lengths.  However,  the  analyses  obtained  by  using  longer  localization  lengths  tend  to  be  more  accurate  than 
those  produced  by  using  short  localization  lengths  or  the  nonparametric  approach.  This  seemingly 
paradoxical  result  is  explained  by  showing  that  localization  with  longer  localization  lengths  produces 
filtered  estimates  whose  time  mean  is  the  most  similar  to  the  time  mean  of  both  the  unfiltered  estimate  and 
the  true  covariance.  This  result  suggests  that  a  better  metric  of  covariance  filtering  skill  would  be  one  that 
combined  a  measure  of  closeness  to  the  sample  covariance  matrix  for  a  very  large  ensemble  with  a  measure 
of  similarity  between  the  climatological  averages  of  the  filtered  and  sample  covariance. 

15.  SUBJECT  TERMS 


16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 

18.  NUMBER 

19a.  NAME  OF 

ABSTRACT 

OF  PAGES 

RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Same  as 
Report  (SAR) 

16 

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI  Std  Z39-18 


September  2011 


JUN  ET  AL. 


3037 


background  and  observation  errors  is  a  multivariate  nor¬ 
mal  distribution  with  a  known  covariance  matrix.  The 
focus  of  this  paper  is  on  the  estimation  of  the  covariance 
matrix  that  describes  the  distribution  of  the  background 
error.  This  matrix  is  called  the  background  error  co- 
variance  matrix  and  we  denote  it  by  P  . 

Sequential  data  assimilation  schemes  use  a  short-term 
forecast  started  from  the  analysis  of  the  previous  analysis 
time  as  background.  Thus,  Pb  is  an  M  X  M  matrix,  where 
M  is  the  number  of  gridpoint  variables  in  the  model.  The 

entries  pbv(u ,  v  =  1 . M)  of  Pb  describe  the  presumed 

covariance  between  the  errors  sb  and  eb  in  the  xb  and 
xb  components  of  the  background  xb.  Each  diagonal  el¬ 
ement  pbm(ii  =  1, . . .  ,  M)  represents  the  variance  of  the 
error  for  a  scalar  model  variable  (e.g.,  temperature,  sur¬ 
face  pressure,  a  component  of  the  wind  vector)  at  a  given 
model  grid  point,  while  the  off-diagonal  elements  repre¬ 
sent  the  covariance  between  the  errors  in  the  estimate 
of  the  same  state  variable  at  different  locations  and  be¬ 
tween  the  errors  in  the  estimate  of  the  different  state 
variables  (e.g.,  temperature  and  meridional  component 
of  the  wind  vector)  at  the  same  or  different  locations. 
Hence,  a  physical  distance,  denoted  by  |zz  —  u|,  can  be 
defined  for  each  covariance  pbw  with  the  distance  between 
the  physical  locations  of  the  two  state  vector  components 
for  which  pbv  describes  the  statistical  relationship  be¬ 
tween  the  state  estimation  errors.  (For  the  diagonal  en¬ 
tries  and  the  entries  that  describe  the  covariance  between 
variables  at  the  same  location,  | u  —  u|  =  0.) 

An  estimate  of  Pb  is  usually  obtained  in  two  steps;  first 
computing  a  sample  covariance  matrix 
K 

P6=-i-Xxfo«[XfeW]T,  (1) 

A  -  1  k=i 

where  {Xb{k):k=  1,  ...  ,  X}  is  a  X-member  sample  of 
the  background  error;  then  filtering  the  sampling  noise 
by  a  statistical  postprocessing  of  Pb  (e.g.,  Berre  and 
Desroziers  2010).  In  this  paper,  we  restrict  our  attention 
to  the  estimation  of  Pfc  in  ensemble-based  Kalman  filter 
(EnKF)  schemes.  In  such  schemes,  {X,,(/c):  k  =  I ,  ...  ,  X} 
is  obtained  by  first  integrating  the  model  from  a 
X-member  ensemble  of  analyses  for  the  previous 
analysis  time  to  obtain  a  background  ensemble  (X^: 
k  =  1,  . . .  ,  X};  then,  taking  the  difference  between  the 
members  of  the  background  ensemble  and  the  back¬ 
ground  mean 

K 

(2) 

x  k=l 

to  obtain  (XftW  =  xh«  -  xh:  k  =  1, . . . ,  X}. 

Because  of  the  large  computational  expense  asso¬ 
ciated  with  an  ensemble  of  model  integrations,  the 


computationally  affordable  sample  size  is  limited;  for 
example,  the  typical  value  of  X  in  a  practical  appli¬ 
cation  is  between  20  and  100  (e.g.,  Houtekamer  and 
Mitchell  2005;  Hamill  2006).  The  small  ensemble  size 
X  poses  two  important  challenges  for  the  statistical 
postprocessing  algorithm  that  provides  the  final  estimate 
Pb  of  Pb  based  on  Pft.  First,  statistical  fluctuations  in  a 
small  ensemble  {XZ,(A):  k  =  I ,  ...  ,  X}  lead  to  a  low  sig- 
nal-to-noise  ratio  for  those  entries  of  Pft  that  estimate 
small  entries  of  Pb ,  which  makes  filtering  the  sample 
noise  challenging.  This  problem  typically  occurs  for  en¬ 
tries  pbw  for  which  the  distance  \u  —  u|  is  large.  Second, 
Pfc  is  a  highly  rank-deficient  estimate  of  Ph  (X  <C  M ), 
because  the  typical  dimension  of  the  state  vector  in 
a  state-of-the-art  atmospheric  model  is  M  =  10-108. 

Practical  implementations  of  the  EnKF  address  the 
aforementioned  two  problems  by  applying  a  physical- 
distance-dependent  stationary  (time  independent)  filter 
function  to  the  sample  covariances  (e.g.,  Houtekamer 
and  Mitchell  1998,  2001;  Hamill  et  al.  2001;  Anderson 
2001;  Ott  et  al.  2004).  This  approach  is  called  localiza¬ 
tion ,  because  it  forces  the  covariance  to  zero  beyond 
a  prescribed  distance  d.  Some  localization  functions 
do  not  change  the  sample-based  estimate  pbw  of  the 
covariance  when  the  distance  \u  —  u|  associated  with 
the  pair  of  state  vector  components  is  smaller  than  d, 
but  replaces  pbw  with  zero  when  \u  —  u|  >  d  (e.g., 
(Houtekamer  and  Mitchell  1998;  Ott  et  al.  2004; 
Szunyogh  et  al.  2005,  2008;  Hunt  et  al.  2007);  other  lo¬ 
calization  functions  taper  the  covariance  to  zero  grad¬ 
ually  with  increasing  distance  \u  —  v\  (e.g.,  Houtekamer 
and  Mitchell  2001;  Hamill  et  al.  2001).  Such  tapering 
functions  modify  all  entries  of  the  sample  covariance 
matrix,  except  for  the  diagonal  elements.  Experience 
with  the  different  EnKF  algorithms  and  localization 
strategies  suggests  that  tapering  greatly  increases  the 
accuracy  of  the  analyses,  especially  when  the  size  of  the 
ensemble  is  small  (e.g.,  Houtekamer  and  Mitchell  2005; 
Hamill  2006)  or  in  the  presence  of  model  error  (e.g., 
Zhang  et  al.  2009b). 

While  there  exist  dynamical  arguments  to  support 
localization  (Yoon  et  al.  2010)  and  it  may  formally  solve 
the  problems  it  is  designed  to  address,1  the  particular 
shape  of  the  tapering  function  is  typically  selected  based 
on  intuition.  Moreover,  if  the  true  covariance  struc¬ 
ture  of  the  system  is  complex  and  does  not  decrease 
monotonically  with  distance,  the  localized  sample  co- 
variance  may  not  be  a  good  fit  to  the  true  covariance 


1  Localization  also  makes  the  data  assimilation  algorithms  more 
suitable  for  implementation  on  massively  parallel  computer  ar¬ 
chitectures. 


3038 


MONTHLY  WEATHER  REVIEW 


Volume  139 


structure.  The  ad  hoc  nature  of  most  localization  algo¬ 
rithms  used  in  the  literature  motivates  us  to  start  ex¬ 
ploring  the  effects  of  covariance  filtering  on  the 
performance  of  the  EnKF  in  a  more  systematic  way.  In 
particular,  we  compare  the  results  obtained  with  local¬ 
ization  to  the  results  obtained  with  an  adaptive  non¬ 
parametric  method  to  estimate  the  entries  of  P  .  The 
property  that  makes  such  a  nonparametric  method 
particularly  suitable  for  our  study  is  that,  instead  of  an  ad 
hoc  choice,  the  shape  of  the  localization  function  and  the 
distance  at  which  it  becomes  zero  is  determined  from  the 
sample  itself.  We  are  aware  of  only  two  earlier  attempts 
at  adaptively  determining  the  localization  distance:  the 
hierarchical  filter  of  Anderson  (2007)  estimates  the  lo¬ 
calization  function  using  an  ensemble  of  ensembles, 
while  the  adaptive  localization  approach  in  Bishop  and 
Hodyss  (2007)  and  the  ensemble  correlations  raised  to 
a  power  (ECO-RAP)  approach  of  Bishop  and  Hodyss 
(2009a,b)  propagate  and  adapt  the  width  of  the  lo¬ 
calization  by  computing  powers  of  the  sample  correla¬ 
tions.  Zhang  et  al.  (2009a),  on  the  other  hand,  proposed 
a  successive  covariance  localization  (SCL)  method  that 
uses  several  different  localization  distances  to  account 
for  different  physical  scales  in  the  background  error 
covariance. 

The  rest  of  the  paper  is  organized  as  follows.  Section  2 
is  a  formal  description  of  covariance  filtering  in  EnKF. 
Section  3  explains  the  design  of  the  data  assimilation 
runs  that  provide  the  data  for  our  quantitative  inves¬ 
tigations.  This  section  includes  a  brief  description  of 
the  particular  implementation  of  the  serial  square  root 
EnKF  scheme  of  Whitaker  and  Hamill  (2002)  on  the 
Lorenz-96  model  (Lorenz  1996;  Lorenz  and  Emanuel 
1998),  which  we  use  to  carry  out  all  data  assimilation 
experiments  described  in  this  paper.  In  section  4,  we 
explain  the  nonparametric  statistical  method  we  use  to 
estimate  the  covariance,  while  in  section  5  we  compare 
the  performance  of  a  standard  localization  method  to 
that  of  the  nonparametric  scheme  in  estimating  the 
background  covariance.  Then,  in  section  6,  we  compare 
the  performance  of  the  different  filtering  strategies  by 
numerical  experiments.  Our  conclusions  are  drawn  in 
section  7. 

2.  Covariance  filtering  in  EnKF 

In  this  section,  we  illustrate  the  process  of  estimating 
P,>  in  an  ensemble-based  data  assimilation  system  with 
the  help  of  the  algorithm  introduced  in  Whitaker  and 
Hamill  (2002).  We  choose  this  particular  algorithm  be¬ 
cause  it  allows  for  a  straightforward  implementation  of 
distance-dependent  filtering  of  the  covariance.  It  also 
performs  well  for  small  ensemble  sizes,  which  makes 


it  an  ideal  scheme  for  testing  the  effects  of  distance- 
dependent  filtering.  Later  in  the  paper,  we  use  an  im¬ 
plementation  of  this  algorithm  on  the  Lorenz-96  model. 
While  all  matrices,  vectors,  and  scalars  described  in  this 
section  are  dependent  on  the  analysis  time,  we  do  not 
indicate  this  time  dependence  in  our  notation,  as  all 
operations  are  carried  out  at  the  same  analysis  time. 

a.  The  EnKF  algorithm  of  Whitaker  and 
Hamill  (2002) 

In  the  extended  Kalman  filter  (Jazwinski  1970),  the 
background  is  updated  with  the  equation 

xa  =  xb  +  K[y°  —  7f(x6)],  (3) 

where  the  Kalman-gain  matrix  K  is  defined  by 

K  =  PhHT(HP,,HT+R)  ‘.  (4) 

In  Eq.  (3),  the  A/- vector  x"  is  the  analysis;  'H(-)  is  the 
observation  operator,  that  is,  H(xb)  is  the  prediction 
of  the  observations  based  on  the  background;  H  is  the 
matrix  that  represents  the  linearization  of  H (•)  about  xh\ 
and  R  is  the  observation  error  covariance  matrix. 

In  an  EnKF,  xb  is  replaced  with  the  ensemble  mean 
x6  and  the  result  of  Eq.  (3)  is  the  ensemble  mean 
analysis  xa.  In  addition  to  the  ensemble  mean  analysis, 
an  EnKF  also  computes  the  ensemble  of  analysis 
perturbations  {X0^:  k  =  1, . . . ,  K},  which  then  can 
be  added  to  the  analysis  mean  to  obtain  the  ensemble 
of  analyses,  (xflW  =  x“  +  Xa(^:  k  =  \, ...  ,  K).  EnKF 
schemes  generate  the  ensemble  of  analysis  pertur¬ 
bations,  {Xfl^):  k  =  1, . . . ,  K},  such  that  they  satisfy 
the  condition 

K 

P"=  — i-  Xxfl(i>[X«]T,  where  (5) 
K~  1  k= l 

Pfl  =  (I  -  KH)Pfa  (6) 

is  the  ensemble-based  estimate  of  the  analysis  error  co- 
variance  matrix.2 

The  algorithm  of  Whitaker  and  Hamill  (2002)  is  a 
serial  algorithm,  that  is,  the  observations  are  assimi¬ 
lated  one  by  one,  serially  updating  the  mean  analysis 
xa  and  the  ensemble  of  analysis  perturbation  (Xfl^: 
k  =  1,  ,  K}.  To  simplify  the  notation,  we  describe 

the  algorithm  for  the  case  when  each  model  variable  is 


2  The  set  of  analysis  perturbations  that  satisfies  this  condition  is 
not  uniquely  defined. 
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directly  observed  by  a  single  observation3 *  and  the  er¬ 
rors  in  the  observations  of  the  different  components  of 
the  state  vector  are  uncorrelated.  We  index  each  ob¬ 
servation  with  the  index  of  the  variable  it  observes,  that 
is,  the  observation  of  x„  is  y°u.  For  the  particular  con¬ 
figuration  of  the  observing  network,  the  component 
Ttll(xb)  of  the  observation  operator  for  y°  is 

xbu  =  nu(xb).  (7) 

Each  observation  y°  is  assimilated  by  the 

xa  =  xb  +  ku(y°-xbu),  (8) 

equivalent  of  Eq.  (3)  for  a  single  observation,  where  the 
vector 


k„  =  P  u(Pbuu  +  ruu 


r1 


(9) 


is  the  Kalman-gain  associated  with  the  observation  of  xu. 
In  Eq.  (9),  the  components  of  the  vector  pb  are  defined 
by  the  covariance  between  the  error  e*  and  the  errors 
eb(v  =  1, . . .  ,M),  pbuu  is  the  variance  of  sbu,  and  rlm  is  the 
observation  error  variance  for  the  observation  y°.  The 
analysis  perturbations  are  updated  by 


Xa(k>  =  - Xbik) ka,  k  =  l,  ...  ,K,  where  (10) 


k  =  ak. 


(11) 


b.  Covariance  filtering 

In  the  computational  algorithm  defined  by  Eqs.  (8)- 
(11),  the  background  error  covariance  enters  in  Eq.  (9) 
through  p|j.  A  distance-dependent  filtering,  for  exam¬ 
ple,  localization,  can  be  implemented  by  filtering  the 
Piw(v  =  1,  ■  ■  •  ,M)  components  of  p*  based  on  the  dis¬ 
tance  j  u  —  u|. 

A  potential  problem  with  covariance  filtering  is  that  it 
produces  an  analysis  ensemble,  {X“^:  k  =  1, ....  K}, 
that  is  not  fully  consistent  with  the  background  error 
covariance  used  in  the  computation  of  the  Kalman  gain: 
while  the  Kalman  gain  is  computed  based  on  the  filtered 
estimate  P*  [see  Eq.  (4)],  the  Kalman  gain  is  applied 
to  the  sample  covariance  matrix  Pb  to  compute  the 
ensemble  perturbations,  (Xfl®:  k  =  1, . . . ,  K}  [by  Eq. 
(6)  in  a  square  root  filter].  While  the  effects  of  this  in¬ 
consistency  on  the  accuracy  of  the  analysis  cannot  be 


3  This  assumption  is  solely  made  to  simplify  the  notation  and  has 

no  effect  on  the  generality  of  our  results. 


predicted  based  on  strictly  theoretical  considerations, 
the  fact  that  it  exists  suggests  that  a  filtering  algorithm 
that  provides  a  more  accurate  estimate  of  P '  does  not 
necessarily  produce  a  more  accurate  analysis. 

c.  Covariance  inflation 

In  a  square  root  EnKF  scheme,  such  as  the  algorithm  of 
Hamill  and  Whitaker  (2005),  the  diagonal  entries  of  the 
sample  covariance  matrix  Ph  tend  to  underestimate  the 
diagonal  entries  of  Ph  (the  variance).  There  are  different 
ways  to  account  for  the  underestimation  of  the  variance 
in  EnKF  (e.g.,  Hamill  and  Whitaker  2005).  The  common 
feature  of  these  techniques  is  that  they  increase  the 
magnitude  of  the  background  ensemble  perturbations 
(X*(*) :  A:  =  1,  . . .  ,  K}.  The  most  popular  approach, 
which  is  also  the  one  used  in  this  paper,  is  to  multiply 
each  background  perturbations  by  a  p  >  1  variance  in¬ 
flation  factor.  This  approach  is  called  multiplicative 
variance  inflation  and  was  introduced  in  Anderson  and 
Anderson  (1999).  It  should  be  noted  that  the  multiplica¬ 
tive  variance  inflation  increases  not  only  the  diagonal 
elements  of  P6,  but  also  the  off-diagonal  ones.  Therefore, 
the  variance  inflation  also  inflates  the  covariance  esti¬ 
mates  in  the  Pz’-filtered  estimate  of  the  covariance  matrix. 


3.  Experiment  design 

In  this  section,  we  briefly  introduce  the  Lorenz-96 
model  and  our  approach  to  generate  the  time  evolving 
“true”  states  and  the  simulated  observations  of  the 
“true”  states. 

a.  The  Lorenz-96  model 

The  governing  equation  of  the  Lorenz-96  model  is  the 

=  K+i(0  -*„_2(0K-i(0  ~xv(t)  +F,  (12) 

system  of  ordinary  differential  equations  for  M  =  40 
scalar  variables  xm  v  =  1, . . . ,  M;  where  xM+1(t)  =  Xj(t), 
x_t(t)  =  xM-flt),  x0(?)  =  xM(t),  and  F  is  a  constant 
forcing  term.  While  a  partial  differential  equation  to 
which  Eq.  (12)  would  be  a  finite-dimensional  approxi¬ 
mation  is  not  known  to  exist,  the  variables  xm  v  =  1, . . . , 
40  are  usually  thought  of  as  gridpoint  values  of  a  scalar 
atmospheric  variable  along  a  latitude  circle.  Using  this 
analog,  the  time  evolution  of  the  model  for  F  =  8  re¬ 
sembles  the  propagation  of  a  wavenumber  8  dispersive 
wave  characterized  by  a  westward  (in  the  direction  of 
decreasing  v)  phase  speed  and  an  eastward  (in  the  di¬ 
rection  of  increasing  v)  group  velocity.  The  model  is 
chaotic:  it  has  13  positive  Lyapunov  exponents  and  its 
Lyapunov  dimension  is  27.1  (Lorenz  and  Emanuel 
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1998).  Thus,  in  the  Lorenz-96  model,  similar  to  the  sit¬ 
uation  in  the  storm-track  regions  in  an  NWP  model,  the 
spatiotemporal  evolution  of  the  uncertainties  is  gov¬ 
erned  by  unstable  dispersive  waves  and  accurate  es¬ 
timates  of  the  state  over  an  extended  time  period  can  be 
obtained  only  by  the  frequent  assimilation  of  observa¬ 
tions.  In  spite  of  its  skill  in  mimicking  an  important 
feature  of  the  propagation  of  state  estimation  errors  in 
realistic  models  of  the  atmosphere,  the  Lorenz-96  model 
is  a  highly  idealized  analog  of  a  realistic  atmospheric 
model.  Most  importantly,  the  spatial  correlations  be¬ 
tween  xv  at  the  different  “grid  points”  does  not  change 
smoothly  with  distance  as  in  a  realistic  model.4  We 
choose  the  Lorenz-96  model  because  (i)  its  low  dimen¬ 
sionality  allows  us  to  test  a  computationally  relatively 
expensive  nonparametric  scheme  to  filter  the  ensem¬ 
ble-based  estimate  of  the  covariance;  (ii)  filtering  the 
ensemble-based  covariance  by  localization  has  a  well- 
documented  positive  effect  on  the  accuracy  of  the  anal¬ 
yses  for  this  model  (Whitaker  and  Hamill  2002);  and 
(iii)  this  model  has  an  excellent  track  record  in  pro¬ 
viding  the  initial  test  environment  for  EnKF  schemes 
(e.g.,  Whitaker  and  Hamill  2002;  Ott  et  al.  2004;  Zhang 
et  al.  2009b),  which  later  proved  competitive  with  the 
state-of-the-art  data  assimilation  schemes  of  the  oper¬ 
ational  centers. 

b.  Generation  of  the  time  series  of  “true”  states 
and  the  observations 

We  solve  Eq.  (12)  using  a  fourth-order  Runge-Kutta 
time  integration  scheme  and  a  time  step  of  0.05  di¬ 
mensionless  time  unit.  This  time  unit  is  defined  by  the 
e-folding  time  of  the  dissipation  in  the  model  (e.g., 
Lorenz  and  Emanuel  1998).  Assuming  that  the  e-folding 
time  of  dissipation  in  the  atmosphere  is  about  5  days,  the 
time  of  0.05  dimensionless  time  unit  is  equivalent  to 
about  6  h  in  real  time.  We  carry  out  all  numerical  ex¬ 
periments  under  the  perfect  model  hypothesis,  gener¬ 
ating  the  “true”  state  space  trajectory  with  a  long  time 
integration  of  the  model.  The  initial  condition  for  this 
integration  is  obtained  by  adding  small-magnitude  ran¬ 
dom  noise  to  the  unstable  steady-state  solution  x„  =  0, 
u  =  1 ,  M ;  then,  discarding  the  initial  transient  part 
of  the  trajectory  (first  1000  time  steps)  and  defining  the 
true  states  x\t)  with  the  states  along  the  remaining 


4  This  shortcoming  of  the  model  was  successfully  corrected  by 
a  modification  of  Eq.  (12)  in  Lorenz  (2005).  Unfortunately,  this 
improvement  of  the  model  was  achieved  at  the  expense  of  in¬ 
creasing  the  number  of  variables,  which  limits  the  appeal  of  the 
improved  model  as  a  low  computational  cost  alternative  to  a  more 
realistic  atmospheric  model. 


portion  of  the  trajectory.  Simulated  observations  are 
generated  for  each  time  step  by  adding  normally  dis¬ 
tributed  random  noise  with  expectation  zero  and  stan¬ 
dard  deviation  1  to  each  variable  xu,  u  =  1, . . . ,  M.  That 
is,  the  observation  error  covariance  matrix  R  is  the 
identity  I.5  We  estimate  the  state  at  each  observation 
time  by  assimilating  the  simulated  observations  with  the 
algorithm  described  by  Eqs.  (8)— (11). 

4.  Statistical  methodology 

We  now  introduce  a  nonparametric  statistical  method 
to  estimate  the  covariance  matrix,  which  is  computa¬ 
tionally  feasible  but  alleviates  some  of  the  potential 
problems  with  the  sample  and  the  localized  sample  co- 
variance  matrices.  We  first  review  some  terminologies  in 
spatial  statistics,  then  introduce  the  nonparametric 
scheme,  gradually  relaxing  the  assumptions  we  make 
about  the  covariances.  We  illustrate  our  main  points 
with  quantitative  results  for  the  Lorenz-96  model.  In 
these  calculations,  we  use  a  sample  covariance  matrix 
based  on  a  K  =  5000  member  ensemble,  which  we  de- 

~b 

note  by  P5000,  as  a  proxy  for  the  true  covariance  matrix 
P\  We  obtain  P5000  by  running  the  EnKF  data  assimi¬ 
lation  system  for  200  analysis  steps  using  a  set  of  initial 
ensemble  members,  which  was  generated  by  adding 
Gaussian  random  noise,  with  mean  zero  and  standard 
deviation  F/10,  to  the  unstable  steady-state  solution  x„  =  0, 
u  =  1 ,M\  then,  choosing  P7,  from  the  last  analysis 
step  to  be  P5000.  For  this  calculation,  we  use  a  weak  co- 
variance  inflation  factor,  p  =  1.005,  because  the  primary 
role  of  variance  inflation  for  such  a  large  ensemble  un¬ 
der  the  perfect  model  scenario  is  to  compensate  for  the 
effects  of  nonlinearities  in  the  model  dynamics. 

a.  Terminology 

We  introduce  the  notation  eb  =  (sb,Bb, . . .  ,ebM)  for 
the  random  vector,  whose  components  are  the  back¬ 
ground  errors  for  the  different  state  variables.  In  spatial 
statistics,  we  say  that  the  process  is  nonstationary  in 
space,  when  pbv  =  Cov{e*,e*}  depends  on  the  two  lo¬ 
cations  a  and  v.  When  the  mean  is  constant  and  the 
covariance  depends  on  the  locations  only  through  the 
difference  of  the  two  locations,  that  is,  Cf  (it  —  v)  = 
Cov{Bb,  eb}  for  an  appropriate  (i.e.,  positive  definite) 
function  C\,  then  we  call  the  process  stationary. 


5  We  also  carried  out  experiments  in  which  we  observed  every 
third  location,  but  because  the  results  were  qualitatively  similar  to 
those  for  observing  all  locations,  we  do  not  report  the  results  for 
that  setting. 
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Finally,  when  the  mean  is  constant  and  the  covariance 
depends  on  the  locations  only  through  the  distance 
between  the  two  locations,  that  is,  C2(\u  —  v\)  = 
Cov{gy,  e*}  for  an  appropriate  function  C2,  we  call  the 
process  isotropic. 

One  option  to  estimate  the  covariance  function  is  to 
use  a  parametric  covariance  model.  There  are  various 
parametric  covariance  models  that  are  isotropic,  for 
example,  the  exponential  function  C2(x)  =  ae~xlP,  the 
Gaussian  function  C’2(x)  =  o,e~(x'^  ,  or  the  Matern  func¬ 
tion  C,(x)  =  a(xlfi)vK,v(xlfi),  where  a ,  [3 ,  v  are  positive 
parameters  and  K,  is  the  modified  Bessel  function  (x  > 
0).  Parametric  covariance  models  are  also  available  for 
certain  nonstationary  processes  (Paciorek  and  Schervish 
2006;  Jun  and  Stein  2007).  In  some  situations,  we  may 
model  a  nonstationary  process  as  a  sum  of  several  in¬ 
dependent,  locally  stationary  processes  with  simple 
parametric  stationary  (or  isotropic)  covariance  func¬ 
tions  (Fuentes  2001).  Parametric  methods,  however,  in 
general,  are  not  as  flexible  as  nonparametric  methods, 
which  do  not  make  presumptions  about  the  general 
shape  of  the  covariance  function.  For  instance,  the 
nonstationarity  of  the  background  covariance  structure 
for  the  Lorenz-96  model  is  complex  and  there  is  no 
obvious  flexible  parametric  covariance  model  to  fit  the 
covariance  structure.  This  complexity  is  illustrated  with 
Fig.  1,  which  shows  the  sample  covariance  function  for 


various  ensemble  sizes  in  the  Lorenz-96  model:  the 
sample  covariance  structure  is  not  symmetric  around  the 
center  and  has  a  strong  dependence  on  the  location. 
These  are  clear  signs  of  strong  anisotropy  and  non¬ 
stationarity  of  the  background  error.  This  result  moti¬ 
vates  us  to  search  for  an  appropriate  nonparametric 
method. 

We  use  a  kernel  smoothing  approach  as  our  non¬ 
parametric  estimation  method,  which  requires  the  se¬ 
lection  of  a  kernel  function  and  a  bandwidth.  A  kernel 
is  a  nonnegative  function,  which  is  symmetric  around 
zero,  and  decreases  monotonically  as  \u  —  v\  goes  to  <»; 
for  example,  G{u  —  v)  =  exp[— (u  —  v)2],  is  a  Gaussian 
kernel  function.  There  are  various  statistical  methods 
to  choose  optimal  kernels  and  bandwidths,  and  in  many 
applications  the  effect  of  the  selection  of  the  kernel  is 
insignificant  compared  to  the  selection  of  the  bandwidth.6 
In  what  follows,  we  first  develop  a  nonparametric  method 
to  estimate  the  covariance  function  for  the  stationary 
case  and  then  we  extend  the  method  to  the  nonstationary 
case. 

b.  Stationary  estimate 

Suppose  a  spatial  process  is  stationary.  Then,  we  es¬ 
timate  phuv  =  C)  (u  —  v)  by  the  kernel  estimator  given  in 
Hall  and  Patil  (1994): 


K  M  M 


^XXXg 

-b  _K  k=1  i=1  7=1 


\(u-v)  -  (i-j) I 


h 


[xfw  -  Xb(k)][X-(k)  -  Xb(k)] 


Puv  = 


M  M 

XXg 

i= 1  7=1 


|(m-u) -(;-/)| 


(13) 


Here,  h  is  a  bandwidth,  G(  )  is  a  kernel  function,  and 
Xb(k)  is  the  spatial  average,  X =  (l/Mj^^xf^,  of 
the  Arth  ensemble  perturbation  \h  xK  The  estimator  in 
Eq.  (13)  is  actually  isotropic  in  the  sense  that  phuv  =  phvu 
for  all  u  and  v.  As  Hall  and  Patil  (1994)  note,  Eq.  (13) 
may  not  give  a  positive  definite  function  and  they  sug¬ 
gest  a  modified  version  of  Eq.  (13)  to  ensure  positive 
definiteness  and  to  achieve  nice  mathematical  proper¬ 
ties  of  the  estimator.  Here  we  do  not  worry  about  the 
positive  definiteness  of  the  estimate  provided  by  Eq. 
(13),  as  it  is  not  the  final  scheme  we  intend  to  use. 
Equation  (13)  can  be  easily  adapted  for  d-dimensional 
case  ( d  >  1)  by  simply  letting  i  and  j  inside  of  the  pa¬ 
rentheses  be  the  coordinates  of  the  zth  and  jth  locations 
on  the  d-dimensional  domain. 

Figure  2  shows  the  estimated  covariance  function 
obtained  with  Eq.  (13)  for  u  =  20  and  v  =  1, . . . ,  40. 


Because  of  the  stationarity  assumption,  the  shape  of 
the  estimated  covariance  function  is  the  same  for  all 
u  =  1,...,  40.  Moreover,  the  estimated  covariance 
function  is  symmetric  around  the  center.  Each  row 
shows  the  result  for  a  given  bandwidth.  The  curves  in 
Fig.  2  do  not  match  the  sample  covariances  in  Fig.  1, 
further  suggesting  that  the  assumption  of  stationarity 
for  the  background  error  in  the  Lorenz-96  model  is  not 
appropriate. 

c.  Nonstationary  estimate 

We  now  extend  Eq.  (13)  for  the  nonstationary  case, 
estimating  phuv  with 


6  For  more  details  on  kernel  estimation  methods,  see  Fan  and 
Yao  (2003). 
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Sample  covariance  between  u=3  and  v 


V 

Fig.  1.  Sample  covariances  between  the  two  locations  u  and  v  using  K  ensemble  members  for 

u  =  3,  30  and  v  —  1, . . . ,  40. 


K  M  M 

4  X  ^^G.(u-h)GXv-h)[X.{k)  -Xb(k)][X.(k)  ~Xb(k)] 


t'uv  M  M 

XXc, 

i= 1  7=1 

where  G;(m;  h)  and  Gj(v:  h)  are  kernel  functions  with 
a  fixed  bandwidth  h.  See  the  appendix  for  the  sketch  of 
proof  for  the  positive  definiteness  of  Eq.  (14).  Since  the 
kernel  functions  depend  on  the  two  locations,  u  and  v, 
Eq.  (14)  provides  a  flow-dependent  estimate  of  the  co- 
variance.  We  may,  for  instance,  use  the  Gaussian  kernel 


(14) 

u-h)Gj(v,h) 


functions  G(u\h)  =  exp[—  (It— u\lh)2]  and  G(v;h)  = 
exp[—  (\j—v\/h)  ].  We  note  that  the  computational  cost 
of  Eq.  (14)  can  be  greatly  reduced  by  using  a  compactly 
supported  kernel  for  G,  since  it  reduces  the  number  of 
terms  in  the  double  summation  over  i  and  j.  Figure  3 
shows  the  estimate  of  the  covariance  function  pbuv  using 
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Fig.  2.  Fitted  covariance  between  the  locations  u  =  20  and  v  =  1, ....  40  using  the 
nonparametric  approach  under  stationarity  assumption. 


Eq.  (14).  The  covariance  curve  from  P5000  is  displayed 
(black  curve)  for  comparison.  The  estimated  co- 
variance  function  changes  with  the  location  u  and 
provides  a  better  fit  to  the  large  scale  structure  of  the 
sample  covariance  (Fig.  1)  than  the  stationary  estimate 

(Fig-  2). 

d.  Adaptive  bandwidth 

As  seen  before,  both  the  stationary  and  non¬ 
stationary  estimates  of  the  covariance  require  the  se¬ 
lection  of  a  bandwidth  h:  a  larger  value  of  h  results  in 
a  smoother  estimate  of  the  covariance,  because  it  in¬ 
volves  a  stronger  averaging  of  the  sample  errors  over 
the  neighboring  locations.  On  the  one  hand,  when  h 
goes  to  zero,  the  estimate  provided  by  Eq.  (14)  be¬ 
comes  similar  to  the  estimate  provided  by  the  sample 
covariance  estimate,  except  that  in  Eq.  (14)  we  divide 
the  sums  by  K  instead  of  K  —  1  and  filter  the  back¬ 
ground  error  perturbation  \h<k>  with  its  mean  Xh(k> 
over  the  locations.  On  the  other  hand,  when  h  is  large, 
the  estimated  covariance  is  smooth  and,  in  the  extreme 
case,  it  becomes  constant.  This  is  because  when  h  is 
large,  the  kernel  values  in  Eqs.  (13)  and  (14)  are  almost 
constant  for  all  ij,  u,  and  v.  See  the  bottom  row  in  Fig.  3 
for  an  example. 


Because  we  expect  the  signal-to-noise  ratio  to  be 
lower  at  larger  distances  \u  —  u|,  we  introduce  a  band¬ 
width  h  =  h(\u  —  u|)  that  increases  with  the  distance.7  In 
particular,  to  make  h  smoothly  varying  with  the  dis¬ 
tance,  we  let  h  =  h\  exp{(|w  —  v\ lh2)2}.  Thus,  we  make 
the  bandwidth  adaptive  at  the  price  of  replacing  the 
single  bandwidth  parameter  h  with  a  pair  of  parameters, 
hi  and  h2 .  Figure  4  shows  examples  of  the  dependence  of 
the  adaptive  bandwidth  on  the  two  parameters  (Fig.  4, 
top  panel)  and  a  comparison  of  the  corresponding  co- 
variance  estimates  with  the  sample  covariance  using  20 
ensemble  members  (Fig.  4,  bottom  panel).  The  two  fig¬ 
ures  use  the  same  color  scale.  For  the  bottom  panel,  we 
also  display  sample  covariances  using  20  and  5000  en¬ 
semble  members  for  comparison.  When  hi  =  1 0  7  and 
h2  =  1,  the  adaptive  bandwidth  is  about  10~7  at  all  dis¬ 
tances;  thus,  the  nonparametric  estimate  becomes  sim¬ 
ilar  to  the  sample  covariance.  The  results  for  h\  =  0.01 
and  h2  =  2  are  fairly  similar.  For  the  other  choices  of  hi 
and  h2  the  bandwidth  increases  with  the  distance  and  the 


7  Localization  of  the  sample  covariance  is  equivalent  to  using  an 
h,  which  is  nearly  zero  within  the  localization  length  and  large 
outside  the  localization  length. 


3044 


MONTHLY  WEATHER  REVIEW 


Volume  139 


bandwith  0.05 


bandwith  0.05 


bandwidth  0.1 


bandwidth  0.1 


bandwidth  0.3 


bandwidth  0.3 


- K=5  --  K=50 

K=10  K=100 

-  K=20 

u=30 

V  V 

Fig.  3.  Fitted  covariance  between  the  locations  u  and  v  using  the  nonparametric  approach  for  the  nonstationary  case. 
The  two  chosen  u  values  are  the  same  as  in  Fig.  1.  The  black  curve  gives  Psooo. 


corresponding  fitted  covariance  values  are  close  to  zero 
after  a  short  distance  (Fig.  4). 

5.  Quantitative  comparison 

Next,  we  compare  the  covariance  estimates  provided 
by  the  nonparametric  scheme  with  those  based  on  the 
sample  and  the  localized  sample  covariance  at  a  fixed 
time  step.  For  this  comparison,  we  again  use  P5000  as 


a  proxy  for  the  true  Pb .  We  measure  the  accuracy  of  all 
our  estimates  to  the  true  Pb  by  computing  the  Frobe- 
nius  norm  of  the  difference  between  the  estimates  and 

- 5000s 

Pb  .  For  a  given  ensemble  size  K  we  randomly  select 
200  sets  of  K  ensemble  members  from  the  5000-member 


8  For  a  matrix  A  =  {a..},  the  Frobenius  norm  is  defined  as 

l|A||F  = 
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adaptive  bandwidth 


Fig.  4.  A  few  adaptive  bandwidth  examples  and  the  corresponding  covariance  fits  with  20 
ensemble  members,  (top)  The  shape  of  four  adaptive  bandwidths  against  the  distance  for  the 
four  combinations  of  hi  and  h2  values,  (bottom)  The  corresponding  fitted  covariance  values 
between  the  locations  u  =  3  and  v  =  1, ....  40  with  20  ensemble  members.  Same  scale  is  used  for 
the  two  figures  for  the  adaptive  bandwidth  case.  For  the  bottom  panel,  we  have  two  additional 
curves  displaying  sample  covariances  with  20  and  5000  ensemble  members  for  comparison. 


-b 

ensemble  from  which  P5000  is  derived.  We  compute  the 
Frobenius  norm  of  the  difference  between  the  estimates 
and  P5000  for  each  of  the  200  sets  of  ensembles  and  char¬ 
acterize  the  accuracy  of  the  estimates  from  the  distri¬ 
bution  of  the  200  values  of  the  difference. 

Results  are  shown  for  various  values  of  the  param¬ 
eters  hi  and  h2  in  Fig.  5.  The  “optimal”  pair  of  values 
is  hi  =  0.05  and  h2  =  0.8  in  terms  of  the  median,  and 
hi  =0.1  and  h2  =  1.5  in  terms  of  the  mean.  For  lo¬ 
calization,  we  use  the  fifth-order  kernel  function  de¬ 
fined  by  Eq.  (4.10)  of  Gaspari  and  Cohn  (1999),  which 
is  the  most  widely  used  localization  function  in  the 
EnKF  literature.  This  function  has  a  single  parameter 
c,  which  controls  the  localization  length:  while  the 
function  formally  becomes  zero  at  distance  2c,  the 
filtered  covariances  are  already  nearly  zero  at  dis¬ 
tance  c.  Figure  6  shows  the  median  and  mean  of  the 
Frobenius  norm  for  different  values  of  c.  For  a  20- 
member  ensemble,  the  “optimal”  localization  length, 
with  respect  to  the  Frobenius  norm,  is  smaller  than  c  = 
24,  the  value  that  was  reported  to  be  optimal  with  re¬ 
spect  to  the  analysis  accuracy  for  a  10-member  ensemble 
in  Whitaker  and  Ftamill  (2002).  Also,  there  is  a  notice¬ 
able  difference  between  the  “optimal”  localization  length 


with  respect  to  the  median,  c  =  7.5,  and  to  the  mean, 
c  =  5. 

The  covariance  estimates  provided  by  the  non- 
parametric  scheme  and  the  localization  are  compared  for 
the  “optimal”  values  of  the  parameters  of  the  two 
schemes  (Fig.  7).  Overall,  with  respect  to  the  mean,  the 
nonparametric  scheme  provides  the  most  accurate  esti¬ 
mate  except  when  the  ensemble  size  is  5.  The  advantage  of 
this  scheme  over  the  sample  covariance  and  the  localiza¬ 
tion  with  c  =  24  is  particularly  large  for  the  small  en¬ 
sembles  ( K  <C  50).  The  advantage  of  the  nonparametric 
scheme  over  localization  with  c  =  5  is  much  more  modest. 
In  addition,  with  respect  to  the  median,  the  localization 
with  c  =  5  outperforms  the  nonparametric  scheme  when 
K  <  50.  The  system  used  here  gives  a  highly  non¬ 
stationary  covariance  structure  for  the  background  er¬ 
rors.  The  nonparametric  scheme  would  be  superior  if 
the  true  covariance  structure  was  stationary  or  isotropic 
[Eq.  (13)]. 

6.  Analysis  experiments  with  filtered  covariances 

In  section  5  we  compared  the  accuracy  of  the  filtered 
covariance  estimates  for  a  single  analysis  time.  In  this 
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Fig.  5.  Frobenius  norm  of  the  (left)  median  and  (right)  mean  of  the  differences  between  the  fitted  covariances  from 
the  nonparametric  scheme  and  P5000  for  various  hi  and  h2  values.  The  median  and  mean  are  computed  from  200 
independent  trials. 


section,  we  carry  out  analysis  experiments  with  a  K  =  20 
member  ensemble  using  the  filtered  estimates  of  the 
background  covariance  in  the  computation  of  the  Kalman 
gain.  The  filter  is  run  for  1500  time  steps  and  statistics 
are  computed  based  on  the  last  1000  time  steps. 

a.  Verification  methods 
We  measure  the  accuracy  of  the  analysis  with 


A  = 


1 

Tooo 


1000 

I 


n= 1 


(15) 


the  time-mean  of  the  root-mean-square  error  over  all 
model  variables.  Here,  eam(tn)  is  the  analysis  error  at  grid 
point  m  at  time  tn,  where  eam(tn)  =  xam(tn)  ~  x‘m(tn). 
Using  the  mean  of  a  time  series  of  root-mean-square 


Median 


mean 


localization  length  localization  length 

Fig.  6.  Frobenius  norm  of  the  differences  between  the  fitted  covariances  from  localization  and  P5000  for  various 
localization  lengths.  We  use  the  same  ensemble  members  for  each  trial  at  a  given  ensemble  size  as  in  Fig.  5.  (left)  The 
median  from  the  200  independent  trials;  (right)  the  mean.  The  minimum  values  for  each  ensemble  size  are  given  by 
the  triangles. 
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Fig.  7.  Frobenius  norm  of  the  differences  between  the  fitted 
covariances  and  P5000  for  various  ensemble  sizes.  We  use  the  same 
ensemble  members  for  each  trial  at  a  given  ensemble  size  as  in 
Fig.  5.  The  box  plots  indicate  the  distribution  of  differences  found 
in  the  200  test  trials  in  the  following  way:  The  lower  and  upper 
bounds  of  the  box  respectively  give  the  25  th  and  75th  percentile  of 
differences  (as  measured  by  the  Frobenius  norm)  found  in  the  200 
trials.  The  thick  line  going  across  the  interior  of  the  box  gives  the 
median  of  the  differences  found  in  the  200  trials.  The  whiskers  or 
thin  lines  depend  on  the  interquartile  range  (IQR)  that  is  pre¬ 
cisely  equal  to  the  vertical  length  of  the  box.  The  whiskers  extend 
to  the  most  extreme  values  (differences  obtained  in  the  200  trials) 
which  are  no  more  than  1.5  IQR  from  the  box.  The  solid  lines 
connect  the  means  of  the  differences  in  the  200  trials  at  each 
ensemble  size. 


errors  is  a  standard  practice  in  numerical  weather  pre¬ 
diction  to  measure  the  quality  of  an  analysis/forecast 
system:  the  root-mean-square  error  computed  over  all 
grid  points  characterizes  the  accuracy  of  the  state  es¬ 
timate  at  a  given  time  by  a  single  number  and  the  time 
mean  over  an  extended  time  period  can  be  easily 
computed  retrospectively  based  on  the  archived  root- 
mean-square  values.  (Computing  the  root-mean-square 
over  time  and  space  would  require  processing  all  grid- 
point  values  at  all  times  at  the  end  of  the  verification 
period.) 

To  characterize  the  estimates  of  the  background  er¬ 
ror,  we  examine  the  estimates  of  pb  for  a  fixed  location 
u.  Since  p*  is  a  probabilistic  variable,  its  estimates 
cannot  be  evaluated  for  a  single  analysis  time  tn,  for 
which  only  a  single  realization,  sb(tn)  =  xfe(?n)  —  x'(fn), 
of  the  background  error  is  available.  {Here  we  use  the 
notation  eb(tn)  =  [ef(*„), e\ (f„),  . . .  ,ebM(tn)]}.  Thus,  we 
use  the 
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(17) 


the  time  mean  of  the  sample  covariance  matrix  P b(t  ). 
For  a  properly  constructed  ensemble,  sb(tn)  and  the 
ensemble  perturbations  (X6(A):  k  =  1,  . . .  ,  K}  are  from 
the  same  distribution;  thus 


K- 1 

Pbc  -  -1—  X  Pbtc  -  Pbtc,  and  (18) 

K  - 1  k=i 

Pbc  -  Pbtc,  (19) 

where  pbc  and  pbtc  are  the  wth  column  of  Pbc  and  Pbtc, 
respectively.  Similar  arguments  can  be  made  to  show 
that  the  filtered  estimate  should  also  satisfy 

Pbc~pbtc,  (20) 

where  pbc  is  the  wth  column  of  P((  . 

We  emphasize  that  Eqs.  (19)  and  (20)  do  not  provide 
information  about  the  accuracy  of  the  estimates  pfc(t„) 
and  p b(t  )  at  a  single  analysis  time  tn\  instead,  these  re¬ 
lations  can  be  used  to  verify  the  consistency  between 
a  time  series  of  the  estimated  background  error  co- 
variance  matrix  and  a  time  series  of  the  true  background 
error.  Such  consistency  is  a  necessary,  but  not  sufficient, 
condition  for  P h(t  )  or  P "(r  )  to  be  an  accurate  repre¬ 
sentation  of  P  ’(t„)  at  the  different  analysis  times. 

b.  Dependence  of  the  results  on  the  filtering 
parameters 

To  establish  a  baseline  of  the  analysis  error,  against 
which  we  can  measure  the  effectiveness  of  covariance 
filtering,  we  first  carry  out  an  analysis  experiment  using 
the  unfiltered  sample  covariance  matrix  P.  We  find  that 
A  takes  its  minimum  value  of  0.23  when  p  =  1.06;  using 
smaller  values  of  p  makes  the  filter  unstable,  while  in¬ 
creasing  p  leads  to  increasingly  larger  values  of  A. 

First,  we  filter  the  sample  covariance  estimates  using 
localization.  Whitaker  and  Hamill  (2002)  studied,  in 
detail,  the  sensitivity  of  the  analysis  error  to  the  locali¬ 
zation  parameter  c  and  the  variance  inflation  coefficient 
p  for  K  =  10.  They  found  that  the  root-mean-square 
error  in  the  analysis  mean  xfl  took  its  smallest  value  of 
0.2  for  c  =  24  and  p  =  1.03. 9  Using  the  same  values  of 


estimate  of  the  “climatological”  value  of  the  true  back-  - 

ground  error  covariance  matrix:  E{sb(tn)[eb(tn)\  }.  We  9  The  correct  figure  to  support  their  result  can  be  found  in  the 
compare  Phtc  with  corrigendum  of  Whitaker  and  Hamill  (2002). 
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Fig.  8.  The  grayscale  shades  show  the  value  of  A  for  p  =  1.025  as  a  function  of  the  parameters 

hi  and  h2. 


c  and  p  for  K  =  20,  we  obtain  a  root-mean-square  error 
of  A  =  0.19.  This  value  indicates  an  error  reduction  of 
0.04  (about  17%)  compared  to  the  case  when  no  co- 
variance  filtering  is  used.  For  c  =  5,  the  value  that  we 
found  optimal  for  a  single  analysis  time  in  section  5,  the 
minimum  value  of  A  is  0.22,  which  can  be  obtained  using 
p  >  1.025.  That  is,  the  performance  of  the  EnKF  for  c  = 
5  is  clearly  inferior  to  that  for  c  =  24,  despite  our  earlier 
result  that  c  =  5  provides  a  more  accurate  estimate  at 
a  single  analysis  time. 

Finally,  we  investigate  the  sensitivity  of  the  analysis 
error  to  the  parameters  h\  and  h2  using  the  non- 
parametric  scheme  with  adaptive  bandwidth  to  filter 
the  covariance.  We  show  results  for  p  =  1.025,  the 
value  of  the  variance  inflation  we  found  to  provide  the 
smallest  value  of  A.  The  results  are  summarized  in  Fig.  8. 
In  this  figure,  the  white  area  indicates  the  parameter 
range  where  the  filter  fails,  indicated  by  a  value  of  A, 
which  is  larger  than  one,  the  root-mean-square  of  the 
observation  error.  The  typical  value  of  A  in  this  region 
is  between  3.0  and  4.0,  which  is  similar  to  the  standard 
deviation  of  the  temporal  changes  in  the  model  vari¬ 
ables.  An  interesting  feature  in  Fig.  8  is  the  sharp 
boundary  between  the  parameter  ranges  where  the 
filter  fails  and  where  the  analysis  error  is  the  smallest. 
The  parameter  range  where  the  performance  of  the 
EnKF  is  nearly  optimal  (marked  by  blue  shades)  is 
wide:  a  large  value  of  h2  cannot  be  used  with  a  small 
value  of  hi,  but  once  h2  is  larger  than  about  0.03,  the 


analysis  error  becomes  insensitive  to  the  choice  of  h2. 
In  essence,  an  increase  of  the  bandwidth  with  distance  is 
important  only  when  the  bandwidth  at  zero  distance 
is  small.  In  the  blue  region,  the  root -mean-square  error  is 
about  A  =  0.2,  lower  than  that  for  localization  with  c  =  5, 
but  slightly  worse  than  that  for  localization  with  c  =  24. 

c.  Consistency  between  the  estimated  and  the 
true  errors 

Figure  9  illustrates  the  level  of  consistency  between 
the  sample  covariance  and  the  covariance  for  the  true 
background  error.  While  the  general  shape  of  the  co- 
variance  function  is  captured  well  by  the  sample  co- 
variance,  at  short  distances,  | u  —  u|  <  2  (18  <  v  <  22), 
the  absolute  value  of  the  covariance  is  somewhat 
overestimated.  This  overestimation  can  be  easily  cor¬ 
rected  by  reducing  the  variance  inflation,  but  reduc¬ 
ing  the  variance  inflation  quickly  leads  to  a  loss  of 
the  stability  of  the  EnKF.  At  longer  distances,  beyond 
\u  —  v\  >  5,  the  time  mean  of  the  sample  covariance  is 
about  zero,  while  the  time  mean  of  the  true  error  is 
small,  but  not  zero  at  most  distances.  We  note  that  the 
near-zero  values  at  the  larger  distances  for  the  sample 
covariance  are  due  to  the  filtering  effect  of  time  aver¬ 
aging,  as  we  observe  relatively  large  instantaneous  es¬ 
timates  of  the  covariance  (results  not  shown)  at  those 
distances.  This  result  suggests  that  the  relatively  large 
instantaneous  values  at  the  larger  distances  are  domi¬ 
nated  by  statistical  fluctuations. 
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Fig.  9.  The  components p|^,v  =  1, ...  ,40,  of  p^c  (gray  dashed)  and 
pk’c,  v  =  1, . . . ,  40  of  p“c  (light  gray  solid)  for  u  =  20. 


We  show  results  on  the  consistency  of  the  estimates 
of  the  covariance  for  c  =  5  (Fig.  10)  and  c  =  24  (Fig.  11). 
In  these  figures,  both  the  sample  and  the  localized  esti¬ 
mates  slightly  overestimate  the  variance.  As  in  the  case 
of  the  results  shown  in  Fig.  9,  this  overestimation  can  be 
corrected  by  reducing  the  variance  inflation,  but  re¬ 
ducing  the  variance  inflation  makes  the  filter  unstable. 
At  distances  \u  —  v\  <  2,  the  consistency  is  clearly  better 
for  c  =  24  than  for  c  =  5,  as  for  the  latter  choice  of  c,  the 
magnitude  of  the  negative  covariance  at  \u  —  v\  =  2  is 
underestimated.  The  difference  in  the  accuracy  of  the 
estimates  at  \u  —  v\  =2  may  explain  the  better  perfor¬ 
mance  of  the  EnKF  for  c  =  24  than  for  c  =  5.  For  c  =  5,  at 
distances  \u  —  v\  >  10,  the  filtered  estimate  is  zero,  while 
the  sample  covariance  shows  some  distance-dependent 
variation.  For  c  =  24,  the  sample  and  the  filtered  co- 
variance  shows  a  high  level  of  consistency  with  each 
other.  This  is  an  indication  that  the  localized  covariance 
with  c  =  24  leads  to  a  better  consistency  between 
the  localized  covariance  estimate  and  the  ensemble 
perturbations. 

We  show  results  for  two  different  choices  of  the 
bandwidth  parameters  of  the  nonparametric  estima¬ 
tion  scheme:  =  0.05,  h2  =  1.5  (Fig.  12)  and  h1  =  0.11, 
h2  =  1.5  (Fig.  13).  A  common  feature  of  these  figures  is 
that  the  consistency  between  the  sample  and  the  fil¬ 
tered  estimates  is  lower  than  for  the  localization- 
based  filtering  for  c  =  24.  (We  recall  that  localization 
with  c  =  24  provides  the  most  accurate  analysis  among 
all  filtering  schemes  tested  here.)  Thus,  we  conclude 
that  the  scheme  that  provides  the  most  accurate 
analysis,  on  average,  is  the  scheme  for  which  the 
sample  and  the  filtered  estimates  are  most  consistent 
with  each  other  (localization  with  c  =  24).  This  is  also 
the  scheme  that  provides  the  best  consistency  with  the 


Fig.  10.  The  components  v  =  1, .  . .  ,40  of  pj]c  (gray 
dashed)  and  pj)*c,n  =  1, . . .  ,40,  of  p[),c  (light  gray  solid)  and 
n  =  1, ....  40,  of  p])c  for  localization  with  c  =  5  for  u  =  20. 


true  errors  for  short  distances.  The  results  of  this 
section  suggest  that  a  better  metric  of  covariance  fil¬ 
tering  skill  would  be  one  that  combined  a  measure  of 
closeness  to  the  sample  covariance  matrix  for  a  very 
large  ensemble  with  a  measure  of  similarity  between 
the  climatological  averages  of  the  filtered  and  sample 
covariance. 

7.  Conclusions 

In  this  paper,  we  investigated  the  effects  of  filtering 
the  sample  covariances  used  in  the  computation  of  the 
Kalman  gain  in  an  EnKF  on  the  accuracy  of  the  esti¬ 
mates  of  the  background  error  covariance.  We  consid¬ 
ered  two  particular  approaches  to  filter  the  sample 


Fig.  11.  The  components  pj^,  v=  1, ...  ,40,  of  pj)0  (gray  dashed) 

and  p^,v  =  1, _ 40  of  pj)tc  (light  gray  solid)  and  p]j,  v  =  1, .... 40 

of  pjf  for  localization  with  c  =  24  for  u  =  20. 
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Fig.  12.  The  components  pfc,  v  =  1, . . .  ,40  of  p]jc  (gray 
dashed)  and  p jjjf,  v  =  1, . . .  ,40  of  p]j,c  (light  gray  solid)  and 
v  =  1, ...  ,40  ofp|;c  for  the  nonparametric  scheme  with  ht  = 
0.05  and  /?2  =  1.5  for  u  =  20. 


Fig.  13.  The  components p^,  v  =  1, . . .  ,40  of  p)jc  (gray  dashed) 
and  p|£c,  v  =  1, . . .  ,40  of  p|fc  (light  gray  solid)  and  v  — 

1 _ ,40  ofp„c  for  the  nonparametric  scheme  with  hi  =  0.11  and 

/12  =  1.5  for  u  =  20. 


covariance:  the  more  traditional  approach  of  covariance 
localization  and  a  new  kernel  smoothing  method  with 
variable  bandwidth  to  obtain  nonparametric  estimates 
of  the  covariances. 

We  found  that  for  a  single  analysis  time,  the  non¬ 
parametric  scheme  provided  the  overall  most  accurate 
estimate  of  the  covariances  and  that  localization 
provided  more  accurate  estimates  with  short  locali¬ 
zation  length  than  with  long  localization  length.  We 
also  found,  however,  that  when  the  analysis  is  cycled, 
localization  with  long  localization  length  provided 
more  accurate  analysis  in  the  root-mean-square  sense 
than  the  nonparametric  scheme  or  localization  with 
short  localization  length.  We  explained  this  result  with 
the  better  consistency  between  the  covariance  estimates, 
the  true  background  error  covariance,  and  the  ensemble 
perturbations  that  represent  the  background  uncertainty. 
Our  results  suggest  that  preserving  such  consistency  is 
important. 

Finally,  we  note  that  the  Lorenz-96  model,  with  its 
rapidly  changing  background  covariance  between  loca¬ 
tions,  poses  a  considerable  challenge  to  the  covariance 
estimation  methods.  It  is  plausible  that  some  of  our 
findings  would  not  hold  for  a  more  realistic  model  in 
which  the  background  error  covariance  changes  in  a 
smoother  way.  In  such  a  model,  distinguishing  between 


the  spatially  rapidly  changing  sampling  noise  and  the 
smoother  changing  true  covariance  should  be  easier, 
which  we  expect  to  benefit  the  new  kernel  smoothing 
method  more  than  localization. 

Acknowledgments.  Mikyoung  Jun,  Marc  G.  Genton, 
and  Fuqing  Zhang  acknowledge  the  support  from  the 
National  Science  Foundation  (ATM  0620624).  Mikyoung 
Jun’s  research  is  also  supported  by  NSF  Grant  DMS- 
0906532.  Istvan  Szunyogh  acknowledges  the  support 
from  NSF  (ATM  0935538)  and  ONR  (N000140910589). 
Marc  Genton’s  research  is  supported  by  NSF  DMS- 
1007504.  Fuqing  Zhang  acknowledges  the  support  from 
ONR  Grant  N000140410471.  Craig  H.  Bishop  acknowl¬ 
edges  support  from  ONR  Project  Element  0602435N, 
Project  Number  BE-435-003.  The  authors  are  grateful  to 
Herschel  Mitchell  (the  Editor),  Andrew  Tangborn,  and 
one  anonymous  reviewer  for  valuable  comments. 

APPENDIX 

Proof  of  the  Positive  Definiteness  of  Eq.  (14) 

We  give  a  brief  sketch  of  the  proof  for  the  positive 
definiteness  of  the  covariance  function  in  Eq.  (14). 

It  is  sufficient  to  show  the  positive  definiteness  of 


M  M 

XXG,(«;i)C.(^)[/fl) 

i=l/=l  ' 


r  M  M 


Xb(')][X^)  -  xh0>]  /£  Xg.(M;/)G  (y-h). 

’  /  i—l  j—1  1 


The  idea  is  similar  to  the  proof  given  by  Eq.  (4)  of  Paciorek  and  Schervish  (2006). 
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